{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "O8uprHQvp2t5"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_10_5_keras_transformers.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "yUDaVUCyp2t7"
   },
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "**Module 10: Time Series in Keras**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "x2WptkYKp2t7"
   },
   "source": [
    "# Module 10 Material\n",
    "\n",
    "* Part 10.1: Time Series Data Encoding for Deep Learning [[Video]](https://www.youtube.com/watch?v=dMUmHsktl04&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_10_1_timeseries.ipynb)\n",
    "* Part 10.2: Programming LSTM with Keras and TensorFlow [[Video]](https://www.youtube.com/watch?v=wY0dyFgNCgY&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_10_2_lstm.ipynb)\n",
    "* Part 10.3: Text Generation with Keras and TensorFlow [[Video]](https://www.youtube.com/watch?v=6ORnRAz3gnA&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_10_3_text_generation.ipynb)\n",
    "* Part 10.4: Introduction to Transformers [[Video]](https://www.youtube.com/watch?v=Z7FIdKVQ7kc&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_10_4_intro_transformers.ipynb)\n",
    "* **Part 10.5: Transformers for Timeseries** [[Video]](https://www.youtube.com/watch?v=SX67Mni0Or4&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_10_5_keras_transformers.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0Wj906IRp2t8"
   },
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running the correct version of TensorFlow.\n",
    "  Running the following code will map your GDrive to ```/content/drive```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "sAdYRJcvp2t8",
    "outputId": "199ce5c7-f601-475c-b80a-fbb9d9740e5c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Mounted at /content/drive\n",
      "Note: using Google CoLab\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    from google.colab import drive\n",
    "    drive.mount('/content/drive', force_remount=True)\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "    %tensorflow_version 2.x\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "riFjHKOU22aO"
   },
   "source": [
    "# Part 10.5: Programming Transformers with Keras"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "14QUyFpKp2uA"
   },
   "source": [
    "This section shows an example of a transformer encoder to predict sunspots.  You can find the data files needed for this example at the following location.\n",
    "\n",
    "* [Sunspot Data Files](http://www.sidc.be/silso/datafiles#total)\n",
    "* [Download Daily Sunspots](http://www.sidc.be/silso/INFO/sndtotcsv.php) - 1/1/1818 to now.\n",
    "\n",
    "The following code loads the sunspot file:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "u12-bjaOp2uA",
    "outputId": "53efb24a-60ec-4eff-8e6b-10cfbae533f0"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Starting file:\n",
      "   year  month  day  dec_year  sn_value  sn_error  obs_num  extra\n",
      "0  1818      1    1  1818.001        -1       NaN        0      1\n",
      "1  1818      1    2  1818.004        -1       NaN        0      1\n",
      "2  1818      1    3  1818.007        -1       NaN        0      1\n",
      "3  1818      1    4  1818.010        -1       NaN        0      1\n",
      "4  1818      1    5  1818.012        -1       NaN        0      1\n",
      "5  1818      1    6  1818.015        -1       NaN        0      1\n",
      "6  1818      1    7  1818.018        -1       NaN        0      1\n",
      "7  1818      1    8  1818.021        65      10.2        1      1\n",
      "8  1818      1    9  1818.023        -1       NaN        0      1\n",
      "9  1818      1   10  1818.026        -1       NaN        0      1\n",
      "Ending file:\n",
      "       year  month  day  dec_year  sn_value  sn_error  obs_num  extra\n",
      "72855  2017      6   21  2017.470        35       1.0       41      0\n",
      "72856  2017      6   22  2017.473        24       0.8       39      0\n",
      "72857  2017      6   23  2017.475        23       0.9       40      0\n",
      "72858  2017      6   24  2017.478        26       2.3       15      0\n",
      "72859  2017      6   25  2017.481        17       1.0       18      0\n",
      "72860  2017      6   26  2017.484        21       1.1       25      0\n",
      "72861  2017      6   27  2017.486        19       1.2       36      0\n",
      "72862  2017      6   28  2017.489        17       1.1       22      0\n",
      "72863  2017      6   29  2017.492        12       0.5       25      0\n",
      "72864  2017      6   30  2017.495        11       0.5       30      0\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import os\n",
    "  \n",
    "names = ['year', 'month', 'day', 'dec_year', 'sn_value' , \n",
    "         'sn_error', 'obs_num', 'extra']\n",
    "df = pd.read_csv(\n",
    "    \"https://data.heatonresearch.com/data/t81-558/SN_d_tot_V2.0.csv\",\n",
    "    sep=';',header=None,names=names,\n",
    "    na_values=['-1'], index_col=False)\n",
    "\n",
    "print(\"Starting file:\")\n",
    "print(df[0:10])\n",
    "\n",
    "print(\"Ending file:\")\n",
    "print(df[-10:])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "qzV1XGS0p2uA"
   },
   "source": [
    "As you can see, there is quite a bit of missing data near the end of the file.  We want to find the starting index where the missing data no longer occurs.  This technique is somewhat sloppy; it would be better to find a use for the data between missing values.  However, the point of this example is to show how to use a transformer encoder with a somewhat simple time series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "x2pQeFAzp2uA",
    "outputId": "61be0871-8395-4fc9-c121-15948194eb3b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "11314\n"
     ]
    }
   ],
   "source": [
    "# Find the last zero and move one beyond\n",
    "start_id = max(df[df['obs_num'] == 0].index.tolist())+1  \n",
    "print(start_id)\n",
    "df = df[start_id:] # Trim the rows that have missing observations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iNZcj-s6Vp2R"
   },
   "source": [
    "Divide into training and test/validation sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "SBaewsY_p2uB",
    "outputId": "6d9e72dd-e2c8-42af-ceaf-be56701276e6"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training set has 55160 observations.\n",
      "Test set has 6391 observations.\n"
     ]
    }
   ],
   "source": [
    "df['sn_value'] = df['sn_value'].astype(float)\n",
    "df_train = df[df['year']<2000]\n",
    "df_test = df[df['year']>=2000]\n",
    "\n",
    "spots_train = df_train['sn_value'].tolist()\n",
    "spots_test = df_test['sn_value'].tolist()\n",
    "\n",
    "print(\"Training set has {} observations.\".format(len(spots_train)))\n",
    "print(\"Test set has {} observations.\".format(len(spots_test)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hm9P7RnqYQzh"
   },
   "source": [
    "The **to_sequences** function takes linear time series data into an **x** and **y** where **x** is all possible sequences of seq_size. After each **x** sequence, this function places the next value into the **y** variable. These **x** and **y** data can train a time-series neural network."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "ZaltDADPp2uB",
    "outputId": "7c35e2f9-674e-46eb-b89c-2d30a11e33bf"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Shape of training set: (55150, 10, 1)\n",
      "Shape of test set: (6381, 10, 1)\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "def to_sequences(seq_size, obs):\n",
    "    x = []\n",
    "    y = []\n",
    "\n",
    "    for i in range(len(obs)-SEQUENCE_SIZE):\n",
    "        #print(i)\n",
    "        window = obs[i:(i+SEQUENCE_SIZE)]\n",
    "        after_window = obs[i+SEQUENCE_SIZE]\n",
    "        window = [[x] for x in window]\n",
    "        #print(\"{} - {}\".format(window,after_window))\n",
    "        x.append(window)\n",
    "        y.append(after_window)\n",
    "        \n",
    "    return np.array(x),np.array(y)\n",
    "    \n",
    "    \n",
    "SEQUENCE_SIZE = 10\n",
    "x_train,y_train = to_sequences(SEQUENCE_SIZE,spots_train)\n",
    "x_test,y_test = to_sequences(SEQUENCE_SIZE,spots_test)\n",
    "\n",
    "print(\"Shape of training set: {}\".format(x_train.shape))\n",
    "print(\"Shape of test set: {}\".format(x_test.shape))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pDjLR0RjYl5y"
   },
   "source": [
    "We can view the results of the **to_sequences** encoding of the sunspot data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "MNH3BG-jp2uB",
    "outputId": "fb8a2318-2da7-49e4-bdc7-d684e4d393e5"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(55150, 10, 1)\n"
     ]
    }
   ],
   "source": [
    "print(x_train.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jlmXgzbcZEHn"
   },
   "source": [
    "Next, we create the transformer_encoder; I obtained this function from a [Keras example](https://keras.io/examples/timeseries/timeseries_transformer_classification/). This layer includes residual connections, layer normalization, and dropout. This resulting layer can be stacked multiple times. We implement the projection layers with the Keras Conv1D."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "YZ5ltq1L397v"
   },
   "outputs": [],
   "source": [
    "from tensorflow import keras\n",
    "from tensorflow.keras import layers\n",
    "\n",
    "def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):\n",
    "    # Normalization and Attention\n",
    "    x = layers.LayerNormalization(epsilon=1e-6)(inputs)\n",
    "    x = layers.MultiHeadAttention(\n",
    "        key_dim=head_size, num_heads=num_heads, dropout=dropout\n",
    "    )(x, x)\n",
    "    x = layers.Dropout(dropout)(x)\n",
    "    res = x + inputs\n",
    "\n",
    "    # Feed Forward Part\n",
    "    x = layers.LayerNormalization(epsilon=1e-6)(res)\n",
    "    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation=\"relu\")(x)\n",
    "    x = layers.Dropout(dropout)(x)\n",
    "    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)\n",
    "    return x + res"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Et2peKf3Z4H_"
   },
   "source": [
    "The following function is provided to build the model, including the attention layer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "id": "HUAVPayh4MKc"
   },
   "outputs": [],
   "source": [
    "def build_model(\n",
    "    input_shape,\n",
    "    head_size,\n",
    "    num_heads,\n",
    "    ff_dim,\n",
    "    num_transformer_blocks,\n",
    "    mlp_units,\n",
    "    dropout=0,\n",
    "    mlp_dropout=0,\n",
    "):\n",
    "    inputs = keras.Input(shape=input_shape)\n",
    "    x = inputs\n",
    "    for _ in range(num_transformer_blocks):\n",
    "        x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout)\n",
    "\n",
    "    x = layers.GlobalAveragePooling1D(data_format=\"channels_first\")(x)\n",
    "    for dim in mlp_units:\n",
    "        x = layers.Dense(dim, activation=\"relu\")(x)\n",
    "        x = layers.Dropout(mlp_dropout)(x)\n",
    "    outputs = layers.Dense(1)(x)\n",
    "    return keras.Model(inputs, outputs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "CkxS7ZpXc_Tf"
   },
   "source": [
    "We are now ready to build and train the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "ZzacEsJip2uB",
    "outputId": "5cb8a25d-1c8d-491e-ca8a-f27707289743"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1/200\n",
      "690/690 [==============================] - 25s 16ms/step - loss: 1919.4844 - val_loss: 463.2157\n",
      "Epoch 2/200\n",
      "690/690 [==============================] - 11s 16ms/step - loss: 1113.0945 - val_loss: 365.1375\n",
      "Epoch 3/200\n",
      "690/690 [==============================] - 11s 16ms/step - loss: 873.6814 - val_loss: 345.2026\n",
      "Epoch 4/200\n",
      "690/690 [==============================] - 11s 16ms/step - loss: 789.0035 - val_loss: 329.4594\n",
      "Epoch 5/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 762.3812 - val_loss: 324.1521\n",
      "Epoch 6/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 750.8315 - val_loss: 323.2135\n",
      "Epoch 7/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 744.6664 - val_loss: 326.0743\n",
      "Epoch 8/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 737.4108 - val_loss: 312.2708\n",
      "Epoch 9/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 719.6789 - val_loss: 307.9902\n",
      "Epoch 10/200\n",
      "690/690 [==============================] - 11s 16ms/step - loss: 715.9462 - val_loss: 296.0132\n",
      "Epoch 11/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 712.4185 - val_loss: 299.6908\n",
      "Epoch 12/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 709.5587 - val_loss: 297.3573\n",
      "Epoch 13/200\n",
      "690/690 [==============================] - 11s 16ms/step - loss: 703.4562 - val_loss: 293.3137\n",
      "Epoch 14/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 714.5865 - val_loss: 310.3690\n",
      "Epoch 15/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 706.6390 - val_loss: 304.2530\n",
      "Epoch 16/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 701.1292 - val_loss: 297.4577\n",
      "Epoch 17/200\n",
      "690/690 [==============================] - 10s 15ms/step - loss: 705.1274 - val_loss: 326.7051\n",
      "Epoch 18/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 696.4119 - val_loss: 290.4462\n",
      "Epoch 19/200\n",
      "690/690 [==============================] - 10s 15ms/step - loss: 701.0396 - val_loss: 324.6311\n",
      "Epoch 20/200\n",
      "690/690 [==============================] - 11s 17ms/step - loss: 694.8000 - val_loss: 304.3717\n",
      "Epoch 21/200\n",
      "690/690 [==============================] - 11s 16ms/step - loss: 697.7822 - val_loss: 314.0597\n",
      "Epoch 22/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 696.5428 - val_loss: 296.1778\n",
      "Epoch 23/200\n",
      "690/690 [==============================] - 11s 16ms/step - loss: 688.9620 - val_loss: 291.3384\n",
      "Epoch 24/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 692.5215 - val_loss: 294.7356\n",
      "Epoch 25/200\n",
      "690/690 [==============================] - 10s 15ms/step - loss: 695.5998 - val_loss: 309.7605\n",
      "Epoch 26/200\n",
      "690/690 [==============================] - 11s 16ms/step - loss: 688.1234 - val_loss: 289.6525\n",
      "Epoch 27/200\n",
      "690/690 [==============================] - 11s 16ms/step - loss: 680.7135 - val_loss: 287.5633\n",
      "Epoch 28/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 689.2556 - val_loss: 306.3144\n",
      "Epoch 29/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 686.9375 - val_loss: 294.5692\n",
      "Epoch 30/200\n",
      "690/690 [==============================] - 10s 15ms/step - loss: 689.2764 - val_loss: 295.0640\n",
      "Epoch 31/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 683.3184 - val_loss: 306.8054\n",
      "Epoch 32/200\n",
      "690/690 [==============================] - 11s 16ms/step - loss: 679.1677 - val_loss: 311.3470\n",
      "Epoch 33/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 683.5298 - val_loss: 292.4295\n",
      "Epoch 34/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 689.2239 - val_loss: 298.1823\n",
      "Epoch 35/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 682.8902 - val_loss: 297.4239\n",
      "Epoch 36/200\n",
      "690/690 [==============================] - 11s 15ms/step - loss: 679.1320 - val_loss: 289.7046\n",
      "Epoch 37/200\n",
      "690/690 [==============================] - 11s 16ms/step - loss: 673.3400 - val_loss: 297.0687\n",
      "200/200 [==============================] - 1s 5ms/step - loss: 214.5603\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "214.56031799316406"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "input_shape = x_train.shape[1:]\n",
    "\n",
    "model = build_model(\n",
    "    input_shape,\n",
    "    head_size=256,\n",
    "    num_heads=4,\n",
    "    ff_dim=4,\n",
    "    num_transformer_blocks=4,\n",
    "    mlp_units=[128],\n",
    "    mlp_dropout=0.4,\n",
    "    dropout=0.25,\n",
    ")\n",
    "\n",
    "model.compile(\n",
    "    loss=\"mean_squared_error\",\n",
    "    optimizer=keras.optimizers.Adam(learning_rate=1e-4)\n",
    ")\n",
    "#model.summary()\n",
    "\n",
    "callbacks = [keras.callbacks.EarlyStopping(patience=10, \\\n",
    "    restore_best_weights=True)]\n",
    "\n",
    "model.fit(\n",
    "    x_train,\n",
    "    y_train,\n",
    "    validation_split=0.2,\n",
    "    epochs=200,\n",
    "    batch_size=64,\n",
    "    callbacks=callbacks,\n",
    ")\n",
    "\n",
    "model.evaluate(x_test, y_test, verbose=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-jeJf5zsp2uB"
   },
   "source": [
    "Finally, we evaluate the model with RMSE."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "DqiV9Oq2p2uB",
    "outputId": "67b2df22-45b0-4deb-d069-0d34ff2b2ac3"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Score (RMSE): 14.647875946283007\n"
     ]
    }
   ],
   "source": [
    "from sklearn import metrics\n",
    "\n",
    "pred = model.predict(x_test)\n",
    "score = np.sqrt(metrics.mean_squared_error(pred,y_test))\n",
    "print(\"Score (RMSE): {}\".format(score))"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "anaconda-cloud": {},
  "colab": {
   "collapsed_sections": [],
   "name": "t81_558_class_10_5_keras_transformers.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3.9 (tensorflow)",
   "language": "python",
   "name": "tensorflow"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
